Abstract

A word-valued source $\mathbf{Y} = Y_1, Y_2, \ldots$ is a discrete random process that is formed by sequentially encoding the symbols of a random process $\mathbf{X} = X_1, X_2, \ldots$ with codewords from a codebook $\mathcal{C}$. These processes appear frequently in information theory (in particular, in the analysis of source-coding algorithms), so it is of interest to give conditions on $\mathbf{X}$ and $\mathcal{C}$ for which $\mathbf{Y}$ will satisfy an ergodic theorem and possess an Asymptotic Equipartition Property (AEP). In this correspondence, we prove the following: (1) if $\mathbf{X}$ is asymptotically mean stationary, then $\mathbf{Y}$ will satisfy a pointwise ergodic theorem and possess an AEP; and (2) if the codebook is prefix-free, then the entropy rate of $\mathbf{Y}$ is equal to the entropy rate of $\mathbf{X}$ normalized by the average codeword length.

I. Introduction

The following notion of a word-valued source appears frequently in source-coding theory [1-4]. Suppose that $\mathcal{A}$ and $\mathcal{B}$ are discrete-finite alphabets and $\mathbf{X} = X_1, X_2, \ldots$ is an $\mathcal{A}$-valued random process. Let $\mathcal{C}$ be a codebook whose codewords take symbols from $\mathcal{B}$ and have different lengths, and let $f : \mathcal{A} \to \mathcal{C}$ be a mapping. The word-valued source generated by $\mathbf{X}$ and $f$ is the $\mathcal{B}$-valued random process $\mathbf{Y} = f(X_1), f(X_2), \ldots$, which is formed by sequentially encoding the symbols of $\mathbf{X}$ with $f$ and concatenating (placing end-to-end) the resulting codewords.

It is of fundamental interest to give broad conditions on $\mathbf{X}$, $f$ and $\mathcal{C}$ for which $\mathbf{Y}$ is guaranteed to possess an Asymptotic Equipartition Property (AEP). A common approach to this type of problem is to determine when the random processes of interest are stationary, after which the classic Shannon-McMillan-Breiman Theorem [5, Thm. 15.7.1] may be used to obtain an AEP. However, this approach is not particularly useful for word-valued sources: for most choices of $f$ and $\mathcal{C}$, $\mathbf{Y}$ will not be stationary, even when $\mathbf{X}$ is stationary. Thus, the primary focus of this paper is to give broad conditions for an AEP without direct recourse to stationarity and the Shannon-McMillan-Breiman Theorem.

Nishiara and Morita [1, Thms. 1 & 2] derived an AEP as well as a conservation of entropy law for $\mathbf{Y}$ when $\mathbf{X}$ is independent and identically distributed (i.i.d.), $f$ is a bijection and $\mathcal{C}$ is prefix-free. (A codebook is said to be prefix-free if no codeword is a prefix of another codeword [5, Chap. 5].) These results were later extended from the i.i.d. case to the more general stationary and ergodic case by Goto et al. in [2, Thm. 2]. We further generalize the results of [1, 2] to the setting where $\mathbf{X}$ is Asymptotically Mean Stationary (AMS), $f$ is a bijection and $\mathcal{C}$ is prefix-free. (The AMS condition is a weaker version of the stationary condition that permits short-term non-stationary properties [6].) As we will see, the resulting AEP and entropy-conservation law do not retain the simplicity of the results reported in [1, 2] for stationary and ergodic $\mathbf{X}$; namely, both extensions are ineluctably linked to an ergodic-decomposition theorem.

In contrast to the aforementioned results for prefix-free codebooks, very little is known about word-valued sources generated by codebooks without the prefix-free property. In [1], Nishiara and Morita derived an upper bound for the sample-entropy rate of $\mathbf{Y}$ when $\mathbf{X}$ is an i.i.d. process and $\mathcal{C}$ is not prefix-free. This upper bound was later supplemented with a non-matching lower bound by Ishida et al. in [4]. These bounds, however, fell short of proving an AEP. We prove an ergodic theorem as well as an AEP for $\mathbf{Y}$ when $\mathbf{X}$ is AMS and $\mathcal{C}$ is arbitrary; and, in doing so, we resolve the open problem reported in [1, 2, 4].

Our results will follow from a new lemma (Lemma 8) for AMS random processes. This lemma is an extension of a result by Gray and Saadat [7, Cor. 2.1], and it demonstrates that the AMS property is invariant to variable-length time shifts: an AMS random process will remain AMS when it is viewed under different time scales. This invariance property will, in turn, allow us to show that $\mathbf{Y}$ is AMS whenever $\mathbf{X}$ is AMS, no matter which $f$ and $\mathcal{C}$ are used. Finally, Gray and Kieffer's AEP for AMS processes [8, Cor. 4] will provide the desired AEP for $\mathbf{Y}$.

An outline of the paper is as follows. We introduce some notation and definitions in Section II. We present an ergodic theorem (Theorem 1-A) in Section III, and in Section IV we restate this ergodic theorem using the language of AMS random processes (Theorem 1-B). We present an AEP (Theorem 2) in Section V. Finally, Theorems 1-B and 2 are proved in Sections VI and VII respectively.

II. Dynamical Systems & Word-Valued Sources 

The notion of "time" is problematic for the development of word-valued sources. In particular, each symbol $X_i$, $i = 1, 2, \ldots$, will produce multiple symbols (a codeword) $f(X_i)$; thus, $\mathbf{X}$ and $\mathbf{Y}$ are naturally defined by different time scales. We simplify notation for these different time scales by using various shift transformations to model the passage of time. A brief review of these transformations and the resulting dynamical systems is given in this section; a complete treatment can be found in [6] and [9]. After this review, we formally define word-valued sources.

A. A Dynamical Systems Model for X 

Let us first introduce some notation. Suppose that $\mathcal{A}$ is a discrete-finite alphabet. For any natural number $n$ (i.e., $n \in \{1, 2, \ldots\}$), let

$$\mathcal{A}^n = \underbrace{\mathcal{A} \times \mathcal{A} \times \cdots \times \mathcal{A}}_{n}$$

denote the $n$-fold Cartesian product of $\mathcal{A}$, and let$^1$ $a^n = a_1, a_2, \ldots, a_n$ denote an arbitrary $n$-tuple from $\mathcal{A}^n$. (These notation conventions will apply to the Cartesian product of every discrete-finite alphabet used in this paper.)

Now suppose that $\mathbf{X} = X_1, X_2, \ldots$ is an $\mathcal{A}$-valued random process that is characterised by a sequence of joint probability distributions

$$p^{(n)}(a^n) = \Pr(X_1 = a_1, X_2 = a_2, \ldots, X_n = a_n), \quad n = 1, 2, \ldots, \tag{1}$$

for which the consistency condition

$$p^{(n)}(a_1, a_2, \ldots, a_n) = \sum_{a \in \mathcal{A}} p^{(n+1)}(a_1, a_2, \ldots, a_n, a), \quad n = 1, 2, \ldots, \tag{2}$$

is satisfied. Instead of characterising $\mathbf{X}$ with the sequence of joint distributions given in (1), we may use a dynamical system without loss of generality. A brief review of this fact is as follows.

$^1$ When $n = 1$, we shall omit the superscript for brevity, e.g., $a^1 = a$ and $\mathcal{A}^1 = \mathcal{A}$.

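Let $\mathcal{X} = \mathcal{A} \times \mathcal{A} \times \cdots$ denote the set of all sequences with elements from $\mathcal{A}$, and let $\mathbf{x} = x_1, x_2, \ldots$ denote an arbitrary member of $\mathcal{X}$. Now let

$$[a^n] = \{\mathbf{x} \in \mathcal{X} : x_1 = a_1, x_2 = a_2, \ldots, x_n = a_n\}$$

denote the cylinder set determined by an $n$-tuple $a^n \in \mathcal{A}^n$, and define $\mathscr{F}(\mathcal{X})$ to be the $\sigma$-field of subsets of $\mathcal{X}$ that is generated by the collection of all cylinder sets. Let $T_{\mathcal{X}} : \mathcal{X} \to \mathcal{X}$ be the left-shift transform that is defined by $T_{\mathcal{X}}(\mathbf{x}) = x_2, x_3, \ldots$. For integers $n \geq 0$, let$^2$

$$T_{\mathcal{X}}^n(\mathbf{x}) = T_{\mathcal{X}}(T_{\mathcal{X}}(\cdots T_{\mathcal{X}}(\mathbf{x}) \cdots))$$

denote the $n$-fold composition of $T_{\mathcal{X}}$, and let

$$T_{\mathcal{X}}^{-n} A = \{\mathbf{x} \in \mathcal{X} : T_{\mathcal{X}}^n(\mathbf{x}) \in A\}$$

denote the preimage of an arbitrary set $A \in \mathscr{F}(\mathcal{X})$ under $T_{\mathcal{X}}^n$. Finally, consider the partition $Q = \{[a] : a \in \mathcal{A}\}$ of $\mathcal{X}$, and define the function $X_0 : \mathcal{X} \to \mathcal{A}$ by setting $X_0(\mathbf{x}) = a$ if $\mathbf{x} \in [a]$; i.e., $X_0(\mathbf{x})$ returns the value of the first symbol, $x_1$, from $\mathbf{x}$.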

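Proposition 1 ([6, 9]): If $\mathbf{X}$ is an $\mathcal{A}$-valued random process that is characterised by a distribution (1) for which the consistency condition (2) holds, then there exists a unique probability measure $\mu$ on $(\mathcal{X}, \mathscr{F}(\mathcal{X}))$ such that $p^{(n)}(a^n) = \mu([a^n])$ for every tuple $a^n \in \mathcal{A}^n$ and every $n = 1, 2, \ldots$. In particular, the distribution of the sequence of $\mathcal{A}$-valued random variables $X_0 \circ T_{\mathcal{X}}^n$, $n = 0, 1, \ldots$, defined on $(\mathcal{X}, \mathscr{F}(\mathcal{X}), \mu)$ matches that of $\mathbf{X}$:

$$\mu\big(\{\mathbf{x} \in \mathcal{X} : X_0(\mathbf{x}) = a_1, X_0(T_{\mathcal{X}}(\mathbf{x})) = a_2, \ldots, X_0(T_{\mathcal{X}}^{n-1}(\mathbf{x})) = a_n\}\big) = \mu\Big(\bigcap_{i=1}^{n} T_{\mathcal{X}}^{-i+1}[a_i]\Big) = p^{(n)}(a^n).$$

The probability measure $\mu$ is called the Kolmogorov measure of the process $\mathbf{X}$.

Proposition 1 shows that the quadruple $(\mathcal{X}, \mathscr{F}(\mathcal{X}), \mu, T_{\mathcal{X}})$ may be used in place of $\mathbf{X}$ without loss of generality. We shall use $(\mathcal{X}, \mathscr{F}(\mathcal{X}), \mu, T_{\mathcal{X}})$ and $\mathbf{X}$ interchangeably.

$^2$ If $n = 0$, define $T_{\mathcal{X}}^0(\mathbf{x}) = \mathbf{x}$.

B. A Dynamical System Model for Y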



Suppose that $\mathcal{B}$ is a discrete-finite alphabet, $N$ is a natural number, and

$$\mathcal{B}^* = \bigcup_{i=1}^{N} \mathcal{B}^i$$

is the set of all $\mathcal{B}$-valued tuples $b^i = b_1, b_2, \ldots, b_i$ whose length $i$ is greater than or equal to 1 and no more than $N$. Let $f : \mathcal{A} \to \mathcal{B}^*$ be a mapping and $\mathcal{C} = \mathrm{Range}(f)$. Finally, let $c$ denote an arbitrary member of $\mathcal{C}$ and $|c|$ its length. We call $f$ a word function, $\mathcal{C}$ a codebook$^3$, and $c$ a codeword.

Definition 1 (Word-Valued Source): Suppose that $\mathbf{X}$ is an $\mathcal{A}$-valued random process and $f$ is a word function. The word-valued source $\mathbf{Y}$ generated by $\mathbf{X}$ and $f$ is defined to be the $\mathcal{B}$-valued random process that is formed by:

(i) sequentially coding the symbols $X_i$, $i = 1, 2, \ldots$, with $f$, and

(ii) concatenating the resulting sequence of codewords: $\mathbf{Y} = f(X_1), f(X_2), f(X_3), \ldots$.
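
Definition 1 can be illustrated with a minimal sketch. The alphabet, the word function `f_example` and the input string below are hypothetical choices made only for illustration; they do not come from the paper.

```python
from typing import Dict, Iterable, List


def word_valued_source(xs: Iterable[str], f: Dict[str, str]) -> List[str]:
    """Encode each source symbol with the word function f and concatenate."""
    ys: List[str] = []
    for x in xs:
        # Step (i): look up the codeword f(x); step (ii): append its symbols.
        ys.extend(f[x])
    return ys


# Hypothetical word function on the alphabet {a, b, c} with binary codewords.
f_example = {"a": "0", "b": "10", "c": "11"}
y_example = word_valued_source("abca", f_example)
```

Here four source symbols produce six output symbols, which is the mismatch between the time scales of X and Y discussed at the start of this section.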

For arbitrary /, the particular realisation of X may not be uniquely determined by observing Y. The 
following definition describes a class of word functions where X can be uniquely recovered from Y. 

Definition 2 (Prefix-Free Word Function): A word function $f$ is said to be prefix-free if:

(i) $f : \mathcal{A} \to \mathcal{C}$ is a bijection, and

(ii) there do not exist two distinct codewords $c$ and $c'$ in $\mathcal{C}$ such that $c_i = c'_i$ for $i = 1, 2, \ldots, \min\{|c|, |c'|\}$.
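
The two conditions of Definition 2 can be checked mechanically for a finite word function. The sketch below is one straightforward way to do so; the dictionaries used as inputs are hypothetical examples.

```python
from itertools import combinations
from typing import Dict


def is_prefix_free(f: Dict[str, str]) -> bool:
    """Check Definition 2 for a word function given as a dict symbol -> codeword."""
    codewords = list(f.values())
    # Condition (i): f must be a bijection onto its range, i.e. injective.
    if len(set(codewords)) != len(codewords):
        return False
    # Condition (ii): no codeword may be a prefix of a different codeword.
    return not any(
        c.startswith(d) or d.startswith(c) for c, d in combinations(codewords, 2)
    )


prefix_free_example = {"a": "0", "b": "10", "c": "11"}
not_prefix_free_example = {"a": "0", "b": "01"}
not_injective_example = {"a": "0", "b": "0"}
```

The pairwise comparison is quadratic in the codebook size, which is adequate for small illustrative codebooks.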
The distribution of the word-valued source $\mathbf{Y}$,

$$q^{(n)}(b^n) = \Pr(Y_1 = b_1, Y_2 = b_2, \ldots, Y_n = b_n), \quad n = 1, 2, \ldots,$$

may be calculated by combining the distribution of $\mathbf{X}$ with $f$. With a slight abuse of notation, let $f^{-1} b^n$ denote the set of $n$-tuples $a^n$ where the first $n$ symbols of the $n$ concatenated codewords $f(a_1), f(a_2), \ldots, f(a_n)$ are equal to $b^n$; that is,

$$f^{-1} b^n = \{a^n \in \mathcal{A}^n : \phi_n(f(a_1), f(a_2), \ldots, f(a_n)) = b^n\},$$

where $\phi_n : \bigcup_{n \leq m \leq nN} \mathcal{B}^m \to \mathcal{B}^n$ is the projection defined by $\phi_n(b_1, b_2, \ldots, b_n, b_{n+1}, \ldots, b_m) = b_1, b_2, \ldots, b_n$. Using this notation, we have that

$$q^{(n)}(b^n) = \begin{cases} \sum_{a^n \in f^{-1} b^n} p^{(n)}(a^n), & \text{if } f^{-1} b^n \neq \emptyset \\ 0, & \text{otherwise,} \end{cases} \tag{3}$$

where $\emptyset$ denotes the empty set.

$^3$ By construction, we have that the length $|c|$ of each codeword $c \in \mathcal{C}$ is bounded by $1 \leq |c| \leq N$. In practice, however, the restriction to codewords with finite length may not be suitable for all applications [1].

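Describing $\mathbf{Y}$ directly with (3) is rather cumbersome, and it is more convenient to use a dynamical system that is formed by coding $(\mathcal{X}, \mathscr{F}(\mathcal{X}), \mu, T_{\mathcal{X}})$ with a sequence-to-sequence coder. To this end, let $\mathcal{Y} = \mathcal{B} \times \mathcal{B} \times \cdots$ denote the collection of all sequences with elements from $\mathcal{B}$, let $\mathbf{y} = y_1, y_2, \ldots$ denote an arbitrary member of $\mathcal{Y}$, and let $\mathscr{F}(\mathcal{Y})$ be the $\sigma$-field of subsets of $\mathcal{Y}$ generated by cylinder sets. Now consider the sequence-to-sequence coder (measurable mapping) $F : \mathcal{X} \to \mathcal{Y}$ that is formed by setting $F(\mathbf{x}) = f(x_1), f(x_2), \ldots$. When $F$ acts on the abstract probability space $(\mathcal{X}, \mathscr{F}(\mathcal{X}), \mu)$, it induces a probability measure $\eta$ on $(\mathcal{Y}, \mathscr{F}(\mathcal{Y}))$ [10, Ex. 9.4.3] [9, Pg. 80]. In particular, $\eta$ and $\mu$ are related by

$$\eta(A) = \mu(F^{-1} A), \quad A \in \mathscr{F}(\mathcal{Y}), \tag{4}$$

where $F^{-1} A = \{\mathbf{x} \in \mathcal{X} : F(\mathbf{x}) \in A\}$ denotes the preimage of a set $A \in \mathscr{F}(\mathcal{Y})$ under $F$. Finally, when $(\mathcal{Y}, \mathscr{F}(\mathcal{Y}), \eta)$ is combined with the left-shift transform $T_{\mathcal{Y}}(\mathbf{y}) = y_2, y_3, \ldots$ and the partition $\{[b] : b \in \mathcal{B}\}$ of $\mathcal{Y}$, the result is a dynamical system model $(\mathcal{Y}, \mathscr{F}(\mathcal{Y}), \eta, T_{\mathcal{Y}})$ for $\mathbf{Y}$. In particular, for each $n = 1, 2, \ldots$ and $b^n \in \mathcal{B}^n$, we have that $\eta([b^n]) = q^{(n)}(b^n)$.

Throughout the remainder of this paper, we shall use the following notation: $(\mathcal{X}, \mathscr{F}(\mathcal{X}), \mu, T_{\mathcal{X}})$ and $\mathbf{X}$ will denote an arbitrary $\mathcal{A}$-valued random process; $f : \mathcal{A} \to \mathcal{C}$ will denote a word function; $F : \mathcal{X} \to \mathcal{Y}$ will denote the sequence-to-sequence coder generated by $f$; and $(\mathcal{Y}, \mathscr{F}(\mathcal{Y}), \eta, T_{\mathcal{Y}})$ and $\mathbf{Y}$ will denote the word-valued source generated by coding $(\mathcal{X}, \mathscr{F}(\mathcal{X}), \mu, T_{\mathcal{X}})$ with $F$, where $\mu$ and $\eta$ are related via (4). In addition, we will use $(\mathcal{W}, \mathscr{F}(\mathcal{W}), \rho, T)$ to represent an arbitrary dynamical system. Here it should always be understood that $\mathcal{W}$ is the sequence space corresponding to some discrete-finite alphabet (an element of which will be written $\mathbf{w} = w_1, w_2, \ldots$); $\mathscr{F}(\mathcal{W})$ is the $\sigma$-field generated by cylinder sets; $\rho$ is a probability measure on $(\mathcal{W}, \mathscr{F}(\mathcal{W}))$; and $T : \mathcal{W} \to \mathcal{W}$ is an arbitrary measurable mapping. When we are explicitly interested in the special case where $T$ is the left-shift transform, we shall use the notation $T_{\mathcal{W}}(\mathbf{w}) = w_2, w_3, \ldots$.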

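April 24, 2009 DRAFT

III. A Pointwise Ergodic Theorem

Theorem 1-A:

(i) If the limit

$$\langle g \rangle(\mathbf{x}) = \lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} g(T_{\mathcal{X}}^i(\mathbf{x})) \tag{5}$$

exists almost surely with respect to $\mu$ (a.s. $[\mu]$) for every bounded-measurable $g : \mathcal{X} \to (-\infty, \infty)$, then the limit

$$\langle \tilde{g} \rangle(\mathbf{y}) = \lim_{m \to \infty} \frac{1}{m} \sum_{j=0}^{m-1} \tilde{g}(T_{\mathcal{Y}}^j(\mathbf{y})) \tag{6}$$

exists a.s. $[\eta]$ for every bounded-measurable $\tilde{g} : \mathcal{Y} \to (-\infty, \infty)$. If $f$ is prefix-free, then the reverse implication also holds.

(ii) If the limit (5) exists and takes a constant value a.s. $[\mu]$ for every bounded-measurable $g : \mathcal{X} \to (-\infty, \infty)$, then the limit (6) exists and takes a constant value a.s. $[\eta]$ for every bounded-measurable $\tilde{g} : \mathcal{Y} \to (-\infty, \infty)$.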

IV. Asymptotically Mean Stationary Random Processes 

Theorem 1-A may be restated in a more compact form using the language of asymptotically mean 
stationary random processes. For this purpose, let us recall the following definitions from Gray [6]. 

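Consider a dynamical system $(\mathcal{W}, \mathscr{F}(\mathcal{W}), \rho, T)$, where $T : \mathcal{W} \to \mathcal{W}$ is an arbitrary measurable mapping. The system is said to be stationary if $\rho(A) = \rho(T^{-1} A)$ for every $A \in \mathscr{F}(\mathcal{W})$. A set $A \in \mathscr{F}(\mathcal{W})$ is said to be $T$-invariant if $A = T^{-1} A$. The system is said to be ergodic if $\rho(A) = 0$ or $1$ for every $T$-invariant set $A$. Finally, the system is said to be Asymptotically Mean Stationary (AMS) if the limit

$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} \rho(T^{-i} A)$$

exists for every $A \in \mathscr{F}(\mathcal{W})$, in which case the set function

$$\bar{\rho}(A) = \lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} \rho(T^{-i} A), \quad A \in \mathscr{F}(\mathcal{W}),$$

is a stationary probability measure on $(\mathcal{W}, \mathscr{F}(\mathcal{W}))$; that is, the system $(\mathcal{W}, \mathscr{F}(\mathcal{W}), \bar{\rho}, T)$ is stationary. The measure $\bar{\rho}$ is called the stationary mean of $\rho$.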
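
The AMS definition can be made concrete with a toy example that is not taken from the paper: a deterministic binary process that starts at 0 and alternates 0, 1, 0, 1, ... with probability one. For the cylinder set $A = [0]$, the probability $\rho(T^{-i} A) = \Pr(X_{i+1} = 0)$ oscillates between 1 and 0, so the process is not stationary, yet the Cesàro averages converge. The function names below are illustrative.

```python
def prob_zero_after_shift(i: int) -> float:
    """rho(T^{-i}[0]) for the alternating process 0, 1, 0, 1, ...

    The (i+1)-th symbol is 0 exactly when i is even.
    """
    return 1.0 if i % 2 == 0 else 0.0


def cesaro_mean(n: int) -> float:
    """(1/n) * sum_{i=0}^{n-1} rho(T^{-i}[0])."""
    return sum(prob_zero_after_shift(i) for i in range(n)) / n


# The individual terms do not converge, but their running average does.
averages = [cesaro_mean(n) for n in (10, 101, 10_000)]
```

In this example the averages approach 1/2, which is the value the stationary mean assigns to the set $[0]$.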

For brevity, we will say that the measure $\rho$ is $T$-stationary / $T$-ergodic / $T$-AMS if the corresponding dynamical system is stationary / ergodic / AMS respectively. The next lemma gives necessary and sufficient conditions for a system to be ergodic and AMS.

Lemma 1:

(i) The system $(\mathcal{W}, \mathscr{F}(\mathcal{W}), \rho, T)$ is AMS if and only if the limit

$$\langle g \rangle(\mathbf{w}) = \lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} g(T^i(\mathbf{w})) \tag{7}$$

exists a.s. $[\rho]$ for every bounded-measurable $g : \mathcal{W} \to (-\infty, \infty)$.

(ii) The system $(\mathcal{W}, \mathscr{F}(\mathcal{W}), \rho, T)$ is ergodic if and only if the limit (7) takes a constant finite value a.s. $[\rho]$ for every bounded-measurable $g : \mathcal{W} \to (-\infty, \infty)$.

The AMS component of Lemma 1 was proved by Gray and Kieffer [8, Thm. 1], and the ergodic component follows from the definition of ergodicity [6, Sec. 6.7]. Using Lemma 1, we may restate Theorem 1-A as follows. A proof of this result can be found in Section VI.

Theorem 1-B:

(i) If $\mu$ is $T_{\mathcal{X}}$-AMS, then $\eta$ is $T_{\mathcal{Y}}$-AMS.

(ii) If $f$ is prefix-free, then $\eta$ is $T_{\mathcal{Y}}$-AMS if and only if $\mu$ is $T_{\mathcal{X}}$-AMS.

(iii) If $\mu$ is $T_{\mathcal{X}}$-ergodic, then $\eta$ is $T_{\mathcal{Y}}$-ergodic.

V. An Asymptotic Equipartition Property 

In this section, we extend the AEP of [1, 2, 4] to the setting where $\mu$ is $T_{\mathcal{X}}$-AMS and $f$ is arbitrary. Two fundamental features of this extension will be the ergodic-decomposition theorem and the AEP for AMS random processes. We briefly review each of these ideas in Subsections V-A and V-B before stating our main results in Subsection V-C.

A. The Ergodic Decomposition Theorem 

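Suppose that $\mathbf{W} = W_1, W_2, \ldots$ is a discrete-finite alphabet random process and $(\mathcal{W}, \mathscr{F}(\mathcal{W}), \rho, T_{\mathcal{W}})$ is the corresponding dynamical system in the sense of Proposition 1, where $T_{\mathcal{W}}(\mathbf{w}) = w_2, w_3, \ldots$ is the left-shift transformation. For each set $A \in \mathscr{F}(\mathcal{W})$, let $1_A$ denote its indicator function:

$$1_A(\mathbf{w}) = \begin{cases} 1, & \text{if } \mathbf{w} \in A \\ 0, & \text{otherwise.} \end{cases}$$

When the limit exists, let

$$\langle 1_A \rangle(\mathbf{w}) = \lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} 1_A(T_{\mathcal{W}}^i(\mathbf{w}))$$

denote the relative frequency of the set $A$ in the sequence $\mathbf{w}$. Finally, for each bounded-measurable function $g : \mathcal{W} \to (-\infty, \infty)$, let $\mathbb{E}[\rho, g]$ denote its expected value:

$$\mathbb{E}[\rho, g] = \int g(\mathbf{w}) \, d\rho(\mathbf{w}).$$

The pair $(\mathcal{W}, \mathscr{F}(\mathcal{W}))$ belongs to a family of measurable spaces called standard spaces [6, Chap. 2]. A distinctive property of these spaces is that they possess a countable generating field [6, Cor. 2.2.1]. Let $\mathcal{G}$ be a countable generating field for $(\mathcal{W}, \mathscr{F}(\mathcal{W}))$. Now let $G(\mathcal{G})$ denote the collection of sequences $\mathbf{w}$ from $\mathcal{W}$ such that the limit $\langle 1_A \rangle(\mathbf{w})$ exists for every generating set $A \in \mathcal{G}$. It can be shown that, for each $\mathbf{w} \in G(\mathcal{G})$, the set function $P_{\mathbf{w}}$ obtained by setting $P_{\mathbf{w}}(A) = \langle 1_A \rangle(\mathbf{w})$ induces a unique $T_{\mathcal{W}}$-stationary probability measure on $(\mathcal{W}, \mathscr{F}(\mathcal{W}))$. Let $E$ denote the set of sequences $\mathbf{w}$ from $G(\mathcal{G})$ where the induced $T_{\mathcal{W}}$-stationary probability measure is also $T_{\mathcal{W}}$-ergodic:

$$E = \{\mathbf{w} \in \mathcal{W} : \mathbf{w} \in G(\mathcal{G}) \text{ and } P_{\mathbf{w}} \text{ is } T_{\mathcal{W}}\text{-ergodic}\}.$$

The set $E$ is called the set of ergodic sequences. Finally, let $\rho^*$ be an arbitrary $T_{\mathcal{W}}$-stationary and $T_{\mathcal{W}}$-ergodic probability measure on $(\mathcal{W}, \mathscr{F}(\mathcal{W}))$, and for each sequence $\mathbf{w} \in \mathcal{W}$ define

$$\rho_{\mathbf{w}} = \begin{cases} P_{\mathbf{w}}, & \text{if } \mathbf{w} \in E \\ \rho^*, & \text{otherwise.} \end{cases}$$

The collection of probability measures $\{\rho_{\mathbf{w}} : \mathbf{w} \in \mathcal{W}\}$ is called the ergodic decomposition of $(\mathcal{W}, \mathscr{F}(\mathcal{W}))$.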

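Lemma 2 (AMS Ergodic Decomposition Theorem [6, 9]): Let $\{\rho_{\mathbf{w}} : \mathbf{w} \in \mathcal{W}\}$ be the ergodic decomposition of $(\mathcal{W}, \mathscr{F}(\mathcal{W}))$ and $E$ the set of ergodic sequences. Then,

(i) the set $E$ is $T_{\mathcal{W}}$-invariant: $E = T_{\mathcal{W}}^{-1} E$,

(ii) $\rho_{\mathbf{w}}(A) = \rho_{T_{\mathcal{W}}(\mathbf{w})}(A)$ for every set $A \in \mathscr{F}(\mathcal{W})$ and every sequence $\mathbf{w} \in \mathcal{W}$,

(iii) for any pair $\mathbf{w}$ and $\mathbf{w}'$, the probability measures $\rho_{\mathbf{w}}$ and $\rho_{\mathbf{w}'}$ are either identical or mutually singular.

Additionally, if $\rho$ is $T_{\mathcal{W}}$-AMS with stationary mean $\bar{\rho}$, then

(iv) $\rho(E) = \bar{\rho}(E) = 1$,

(v) for each set $A \in \mathscr{F}(\mathcal{W})$

$$\bar{\rho}(A) = \int \rho_{\mathbf{w}}(A) \, d\rho(\mathbf{w}),$$

(vi) the limit

$$\langle g \rangle(\mathbf{w}) = \lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} g(T_{\mathcal{W}}^i(\mathbf{w})) = \mathbb{E}[\rho_{\mathbf{w}}, g]$$

holds a.s. $[\rho]$ for each bounded-measurable function $g : \mathcal{W} \to (-\infty, \infty)$.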

B. An AEP for AMS Random Processes 

As before, suppose that $\mathbf{W} = W_1, W_2, \ldots$ is a discrete-finite alphabet random process and $(\mathcal{W}, \mathscr{F}(\mathcal{W}), \rho, T_{\mathcal{W}})$ is the corresponding dynamical system. For each sequence $\mathbf{w} \in \mathcal{W}$, the probability $\rho([w^n])$ is non-increasing in $n$. If $\rho$ is $T_{\mathcal{W}}$-AMS, then Gray and Kieffer's AEP [8] asserts that this decrease is exponential in $n$ on a set of probability one; in particular, the (asymptotic) rate of descent is given by the entropy rate of the underlying $T_{\mathcal{W}}$-stationary and $T_{\mathcal{W}}$-ergodic probability measure from the ergodic decomposition theorem. A formal statement of this idea is given in the next lemma. However, before this lemma is given, we briefly review the concepts of joint entropy, entropy rate and sample-entropy rate.
The joint entropy $H(W^n)$ of the first $n$ random variables from $\mathbf{W}$ is defined as [5]

$$H(W^n) = \sum_{w^n} \Pr[W^n = w^n] \log \frac{1}{\Pr[W^n = w^n]}.$$

With respect to the Kolmogorov measure $\rho$, we define the joint entropy of the first $n$ random variables to be

$$H_n(\rho) = \sum_{w^n} \rho([w^n]) \log \frac{1}{\rho([w^n])}.$$

From Proposition 1, these functionals are consistent in that $H(W^n) = H_n(\rho)$. When the limit exists, the entropy rate of $\mathbf{W}$ is defined as $H(\mathbf{W}) = \lim_{n \to \infty} (1/n) H(W^n)$ [5, Chap. 4]. Similarly, we define the entropy rate of $\mathbf{W}$ with respect to $\rho$ to be $H(\rho) = \lim_{n \to \infty} (1/n) H_n(\rho)$ when the limit exists. Finally, we define the sample-entropy rate of a sequence $\mathbf{w} \in \mathcal{W}$ with respect to $\rho$ as

$$h(\rho, \mathbf{w}) = \lim_{n \to \infty} \frac{1}{n} \log \frac{1}{\rho([w^n])},$$

when the limit exists.
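
The finite-$n$ quantity inside the sample-entropy limit is easy to evaluate when the measure has product form. The sketch below assumes an i.i.d. measure with a hypothetical marginal `p_example` and uses base-2 logarithms; both are illustrative choices (the text does not fix a logarithm base).

```python
from math import log2
from typing import Dict


def sample_entropy(w: str, p: Dict[str, float]) -> float:
    """(1/n) * log2(1 / rho([w^n])) for an i.i.d. measure with marginal p."""
    n = len(w)
    # For an i.i.d. measure, rho([w^n]) is the product of the marginals,
    # so its logarithm is a sum over the symbols of w.
    return -sum(log2(p[s]) for s in w) / n


p_example = {"a": 0.5, "b": 0.25, "c": 0.25}
# A sequence whose empirical frequencies match p_example exactly.
typical_prefix = "aabc" * 10
rate = sample_entropy(typical_prefix, p_example)
```

For this prefix the value coincides with the entropy of the marginal, 1.5 bits per symbol, which is what the AEP predicts for typical sequences as $n$ grows.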

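Lemma 3 (Asymptotic Equipartition Property [10]): Let $\{\rho_{\mathbf{w}} : \mathbf{w} \in \mathcal{W}\}$ be the ergodic decomposition of $(\mathcal{W}, \mathscr{F}(\mathcal{W}))$. If $\rho$ is $T_{\mathcal{W}}$-AMS with stationary mean $\bar{\rho}$, then there exists a set $\Omega \in \mathscr{F}(\mathcal{W})$ with probability $\rho(\Omega) = 1$ such that the sample-entropy rate $h(\rho, \mathbf{w})$ of any sequence $\mathbf{w} \in \Omega$ exists and is given by

$$h(\rho, \mathbf{w}) = \varphi(\mathbf{w}), \tag{8}$$

where $\varphi$ is the $T_{\mathcal{W}}$-invariant function that is defined by $\varphi(\mathbf{w}) = H(\rho_{\mathbf{w}})$. Furthermore, the entropy rate of $\rho$ exists and is given by

$$H(\rho) = H(\bar{\rho}) = \mathbb{E}[\rho, \varphi].$$

Finally, if $\rho$ is $T_{\mathcal{W}}$-ergodic, then $h(\rho, \mathbf{w}) = H(\rho) = H(\bar{\rho})$ for every $\mathbf{w} \in \Omega$.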

C. An AEP for Word Valued Sources 

We now return to the problem of establishing an AEP for $\mathbf{Y}$. From Theorem 1-B and Lemma 3, it is clear that $\mathbf{Y}$ satisfies an AEP whenever $\mu$ is $T_{\mathcal{X}}$-AMS. It turns out, however, that not only does the limit $h(\eta, \mathbf{y})$ exist almost surely, but its value may also be bounded from above by the entropy rate of $\mathbf{X}$ normalized by the expected codeword length. We formalize this idea in the following theorem.

Theorem 2: Let $\{\mu_{\mathbf{x}} : \mathbf{x} \in \mathcal{X}\}$ be the ergodic decomposition of $(\mathcal{X}, \mathscr{F}(\mathcal{X}))$. If $\mu$ is $T_{\mathcal{X}}$-AMS, then $\eta$ is $T_{\mathcal{Y}}$-AMS and there exists a set $\Omega_x \in \mathscr{F}(\mathcal{X})$ with probability $\mu(\Omega_x) = 1$ such that, for every sequence $\mathbf{x} \in \Omega_x$, the sample-entropy rate $h(\eta, F(\mathbf{x}))$ of the word-valued sequence $F(\mathbf{x}) = f(x_1), f(x_2), \ldots$ exists and is bounded from above by

$$h(\eta, F(\mathbf{x})) \leq \frac{H(\mu_{\mathbf{x}})}{\mathbb{E}[\mu_{\mathbf{x}}, l]}, \tag{9}$$

where $l : \mathcal{X} \to \{1, 2, \ldots, N\}$ is given by $l(\mathbf{x}) = |f(x_1)|$. In addition, if $f$ is prefix-free, then the inequality in (9) becomes an equality.
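
For intuition, the right-hand side of (9) can be evaluated in the simplest special case. The sketch below assumes X is i.i.d. (so the process is ergodic and the bound reduces to the entropy of the marginal divided by the mean codeword length) and uses base-2 logarithms; the marginals and word functions are hypothetical examples.

```python
from math import log2
from typing import Dict


def entropy_bits(p: Dict[str, float]) -> float:
    """Entropy of the marginal p in bits (equals the entropy rate for i.i.d. X)."""
    return -sum(q * log2(q) for q in p.values() if q > 0)


def mean_codeword_length(p: Dict[str, float], f: Dict[str, str]) -> float:
    """Expected length of the first codeword, E[l], under the marginal p."""
    return sum(p[a] * len(f[a]) for a in p)


def theorem2_bound(p: Dict[str, float], f: Dict[str, str]) -> float:
    """Entropy rate of X normalized by the mean codeword length."""
    return entropy_bits(p) / mean_codeword_length(p, f)


# Prefix-free example: equality is expected in (9).
bound_prefix_free = theorem2_bound(
    {"a": 0.5, "b": 0.25, "c": 0.25}, {"a": "0", "b": "10", "c": "11"}
)
# Non-injective example: every symbol maps to "0", so Y = 0, 0, 0, ...
bound_collapsing = theorem2_bound({"a": 0.5, "b": 0.5}, {"a": "0", "b": "0"})
```

Both bounds evaluate to 1 bit per code symbol. In the second example the output sequence is constant, so its sample-entropy rate is 0 and the inequality in (9) is strict, which is consistent with the word function not being prefix-free.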

A proof of Theorem 2 follows in Section VII. The next corollary demonstrates that if $\mathbf{X}$ is AMS, then the entropy in each symbol of $\mathbf{X}$ is conserved with respect to each stationary and ergodic sub-source from the ergodic-decomposition theorem. This behaviour is consistent with the entropy-conservation laws of variable-to-fixed length source codes [11, 12].

Corollary 2.1: If $\mu$ is $T_{\mathcal{X}}$-AMS, then the entropy rate of $\eta$ exists and is bounded from above by

$$H(\eta) \leq \int \frac{H(\mu_{\mathbf{x}})}{\mathbb{E}[\mu_{\mathbf{x}}, l]} \, d\mu(\mathbf{x}). \tag{10}$$

In addition, if $f$ is prefix-free, then the inequality in (10) becomes an equality.

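Finally, the next corollary resolves the open problem reported in [1, 2, 4]: if $\mathbf{X}$ is stationary and ergodic, then an AEP holds for $\mathbf{Y}$.

Corollary 2.2: If $\mu$ is $T_{\mathcal{X}}$-stationary and $T_{\mathcal{X}}$-ergodic, then $\eta$ is $T_{\mathcal{Y}}$-ergodic and

$$h(\eta, F(\mathbf{x})) = H(\eta) \leq \frac{H(\mu)}{\mathbb{E}[\mu, l]} \quad \text{a.s. } [\mu]. \tag{11}$$

In addition, if $f$ is prefix-free, then the inequality in (11) becomes an equality.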

VI. Proof of Theorem 1

The proof of Theorem 1-B (and Theorem 1-A) will use Lemmas 4 through 9, which are given respectively in Subsections VI-A through VI-E. The forward and reverse implications of Theorem 1-B are proved in Subsections VI-F and VI-G respectively.

A. Subsequences, Weighted Sequences & Density 

Suppose that $\zeta = \zeta_0, \zeta_1, \zeta_2, \ldots$ is a strictly increasing subsequence in the non-negative integers $\mathbb{Z}^* = \{0, 1, 2, \ldots\}$. Let $\xi = \xi_0, \xi_1, \xi_2, \ldots$ be the weight sequence obtained from $\zeta$ by setting

$$\xi_n = \begin{cases} 1, & \text{if } n = \zeta_k \text{ for some } k = 0, 1, \ldots \\ 0, & \text{otherwise.} \end{cases} \tag{12}$$

When the limit exists, the density of $\zeta$ in $\mathbb{Z}^*$ is defined as

$$d_\zeta = \lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} \xi_i. \tag{13}$$

The next lemma follows directly from these definitions, e.g., see [13, Prop. 1.7].

Lemma 4: Suppose that $\zeta$ is a strictly increasing subsequence in $\mathbb{Z}^*$ with density $d_\zeta > 0$ and weight sequence $\xi$. For any sequence $r = r_0, r_1, \ldots$ of real numbers, we have that

$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} \xi_i r_i = d_\zeta \lim_{k \to \infty} \frac{1}{k} \sum_{j=0}^{k-1} r_{\zeta_j};$$

that is, the existence of either limit implies the existence of the other.
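
Lemma 4 can be sanity-checked numerically on a finite horizon. In the sketch below the subsequence (the even integers), the real sequence `r` and the horizon `n` are all hypothetical choices made so that both sides can be computed exactly.

```python
from typing import List


def weight_sequence(zeta: List[int], n: int) -> List[int]:
    """First n terms of the 0/1 weight sequence of the subsequence zeta, as in (12)."""
    members = set(zeta)
    return [1 if i in members else 0 for i in range(n)]


n = 4000
zeta = list(range(0, n, 2))          # even integers below n: density 1/2
xi = weight_sequence(zeta, n)
r = [i % 4 for i in range(n)]        # a bounded real sequence

density = sum(xi) / n                # finite-n version of (13)
# Left-hand side: weighted average over the original time scale.
lhs = sum(x * ri for x, ri in zip(xi, r)) / n
# Right-hand side: density times the average along the subsequence.
rhs = density * sum(r[z] for z in zeta) / len(zeta)
```

With these choices both sides equal 0.5 at the chosen horizon; for general sequences the two sides agree only in the limit.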

B. Invariant Sets & Asymptotic Mean Stationarity 

The next lemma gives some equivalent conditions for AMS dynamical systems.

Lemma 5 (Cor. 6.3.4, [6]; Thm. 2.2, [14]): For a dynamical system $(\mathcal{W}, \mathscr{F}(\mathcal{W}), \rho, T)$, the following statements are equivalent:

(i) $\rho$ is $T$-AMS.

(ii) There exists a $T$-stationary probability measure $\bar{\rho}$ on $(\mathcal{W}, \mathscr{F}(\mathcal{W}))$ such that $\bar{\rho}$ asymptotically dominates $\rho$; that is, $\bar{\rho}(A) = 0$ implies $\lim_{n \to \infty} \rho(T^{-n} A) = 0$.

(iii) The limit $\lim_{n \to \infty} (1/n) \sum_{i=0}^{n-1} g(T^i(\mathbf{w}))$ exists a.s. $[\rho]$ for every bounded-measurable $g : \mathcal{W} \to (-\infty, \infty)$. (See also Lemma 1.)

(iv) There exists a $T$-stationary probability measure $\bar{\rho}$ on $(\mathcal{W}, \mathscr{F}(\mathcal{W}))$ such that $A = T^{-1} A$ and $\bar{\rho}(A) = 0$ together imply that $\rho(A) = 0$.

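C. Stationary, Ergodic & AMS Sequence Coders

In Section II, we defined the word-valued source $(\mathcal{Y}, \mathscr{F}(\mathcal{Y}), \eta, T_{\mathcal{Y}})$ using a sequence coder $F : \mathcal{X} \to \mathcal{Y}$. In the proof of Theorem 1-B, it will be necessary to determine when such a sequence coder will transfer stationary / ergodic / AMS properties from the input to the output. For this purpose, we now review the notions of stationary, ergodic and AMS sequence coders.

Suppose that $(\mathcal{W}, \mathscr{F}(\mathcal{W}), \rho_\alpha, T_\alpha)$ and $(\mathcal{U}, \mathscr{F}(\mathcal{U}), \rho_\beta, T_\beta)$ are dynamical systems, where $\mathcal{W}$ and $\mathcal{U}$ are sequence spaces corresponding to some discrete-finite alphabets; $\mathscr{F}(\mathcal{W})$ and $\mathscr{F}(\mathcal{U})$ are $\sigma$-fields generated by cylinder sets; $T_\alpha : \mathcal{W} \to \mathcal{W}$ and $T_\beta : \mathcal{U} \to \mathcal{U}$ are arbitrary measurable maps; $G : \mathcal{W} \to \mathcal{U}$ is a sequence coder; $\rho_\alpha$ is a probability measure on $(\mathcal{W}, \mathscr{F}(\mathcal{W}))$; and $\rho_\beta$ is induced by $G$:

$$\rho_\beta(A) = \rho_\alpha(G^{-1} A), \quad A \in \mathscr{F}(\mathcal{U}).$$

The sequence coder $G$ also induces a probability measure $\rho_{\alpha\beta}$ on the product space$^4$ $(\mathcal{W} \times \mathcal{U}, \mathscr{F}(\mathcal{W}) \times \mathscr{F}(\mathcal{U}))$ via

$$\rho_{\alpha\beta}(A \times B) = \rho_\alpha(A \cap G^{-1} B), \quad A \in \mathscr{F}(\mathcal{W}), \ B \in \mathscr{F}(\mathcal{U}).$$

The two shifts $T_\alpha$ and $T_\beta$ together define a product shift $T_{\alpha\beta} : \mathcal{W} \times \mathcal{U} \to \mathcal{W} \times \mathcal{U}$ via $T_{\alpha\beta}(\mathbf{w}, \mathbf{u}) = (T_\alpha(\mathbf{w}), T_\beta(\mathbf{u}))$. The combination of $\rho_{\alpha\beta}$ and $T_{\alpha\beta}$ yields a dynamical system $(\mathcal{W} \times \mathcal{U}, \mathscr{F}(\mathcal{W}) \times \mathscr{F}(\mathcal{U}), \rho_{\alpha\beta}, T_{\alpha\beta})$.

The sequence coder $G$ is said to be $(T_\alpha, T_\beta)$-stationary / $(T_\alpha, T_\beta)$-ergodic / $(T_\alpha, T_\beta)$-AMS if, for any $T_\alpha$-stationary / $T_\alpha$-ergodic / $T_\alpha$-AMS probability measure $\rho_\alpha$, the induced measure $\rho_{\alpha\beta}$ is $T_{\alpha\beta}$-stationary / $T_{\alpha\beta}$-ergodic / $T_{\alpha\beta}$-AMS.

Lemma 6 (Ex. 9.4.3, [10]): A sequence coder $G$ is $(T_\alpha, T_\beta)$-stationary if and only if $G(T_\alpha(\mathbf{w})) = T_\beta(G(\mathbf{w}))$.

Lemma 7 (Lems. 9.3.2 & 9.4.1, [10]): If $G$ is $(T_\alpha, T_\beta)$-stationary, then $G$ is also $(T_\alpha, T_\beta)$-ergodic and $(T_\alpha, T_\beta)$-AMS.

We note in passing that the sequence coder $F$ generated by the word function $f$ is not $(T_{\mathcal{X}}, T_{\mathcal{Y}})$-stationary. Thus, Theorem 1-B does not follow directly from Lemma 7. The additional result needed to prove Theorem 1-B is given in the next section.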

D. AMS Processes & Variable Length Shifts 

Suppose that $\mathbf{W}$ is a discrete-finite alphabet random process and $(\mathcal{W}, \mathscr{F}(\mathcal{W}), \rho, T_{\mathcal{W}})$ is the corresponding dynamical system, where $T_{\mathcal{W}}(\mathbf{w}) = w_2, w_3, \ldots$ is the left-shift transform. Now, suppose that $N$ is a natural number and $\mathbf{W}$ is parsed into a sequence of non-overlapping blocks of length $N$ to form the block-valued process $\mathbf{W}^N = \{(W_{nN+1}, W_{nN+2}, \ldots, W_{(n+1)N}); \ n = 0, 1, \ldots\}$; i.e., $\mathbf{W}^N$ is simply $\mathbf{W}$ viewed in blocks of length $N$. The appropriate shift transform for $\mathbf{W}^N$ is the $N$-block shift $T_{\mathcal{W}^N} : \mathcal{W} \to \mathcal{W}$ of Gray and Kieffer [8] (see also Gray and Saadat [7]), which is defined by

$$T_{\mathcal{W}^N}(\mathbf{w}) = T_{\mathcal{W}}^N(\mathbf{w}) = w_{N+1}, w_{N+2}, \ldots.$$

The following proposition shows that the AMS property transcends block-time scales.

$^4$ We use $\mathscr{F}(\mathcal{W}) \times \mathscr{F}(\mathcal{U})$ to denote the product $\sigma$-field induced by rectangles of the form $A \times B$, $A \in \mathscr{F}(\mathcal{W})$, $B \in \mathscr{F}(\mathcal{U})$ [15, Pg. 97].

Proposition 2 (Cor. 2.1, [7]): If $\rho$ is $T_{\mathcal{W}^N}$-AMS for some natural number $N$, then $\rho$ is $T_{\mathcal{W}^M}$-AMS for every natural number $M$.

Proposition 2 does not have analogues for stationary and / or ergodic random processes; it is a unique property of AMS random processes. We now extend this proposition to include the more general notion of "variable-length" parsing, which will be necessary for our study of word-valued sources.

Suppose now that $\mathbf{W}$ is parsed into a sequence of non-overlapping blocks, where the length of each block is determined by a simple function $\gamma : \mathcal{W} \to \{1, 2, \ldots, N\}$. The appropriate transform for this variable-length parsing is the variable-length shift of Gray and Kieffer [8, Ex. 6].

Definition 3 (Variable-Length Shift): Suppose that $\gamma : \mathcal{W} \to \{1, 2, \ldots, N\}$ is a simple measurable function and that there exists a natural number $M$ such that $\gamma(\mathbf{w}) = \gamma(\mathbf{w}')$ for every pair of sequences $\mathbf{w}, \mathbf{w}' \in \mathcal{W}$ with $w_i = w'_i$ for every $i = 1, 2, \ldots, M$. The variable-length shift $T_{\mathcal{W}\gamma} : \mathcal{W} \to \mathcal{W}$ generated by $\gamma$ is defined by [8]

$$T_{\mathcal{W}\gamma}(\mathbf{w}) = T_{\mathcal{W}}^{\gamma(\mathbf{w})}(\mathbf{w}) = w_{\gamma(\mathbf{w})+1}, w_{\gamma(\mathbf{w})+2}, \ldots.$$
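
Definition 3 can be illustrated on a finite prefix of a sequence. In the sketch below, `gamma_example` is a hypothetical length function that depends only on the first symbol (so $M = 1$ and $N = 2$ in the notation of the definition); it is not taken from the paper.

```python
from typing import Callable, List


def gamma_example(w: List[int]) -> int:
    """Hypothetical length function: shift by 2 if the first symbol is 0, else by 1."""
    return 2 if w[0] == 0 else 1


def variable_length_shift(w: List[int], gamma: Callable[[List[int]], int]) -> List[int]:
    """Drop the first gamma(w) symbols of w (finite-prefix version of the shift)."""
    return w[gamma(w):]


w_example = [0, 1, 1, 0, 1, 0, 0, 1]
once = variable_length_shift(w_example, gamma_example)    # drops 2 symbols
twice = variable_length_shift(once, gamma_example)        # drops 1 more symbol
```

Because the shift length depends on the sequence itself, repeated application advances time by a data-dependent amount, unlike the fixed $N$-block shift.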

Our extension of Proposition 2 is given in the next lemma. This lemma will be the centrepiece of our proof of Theorem 1-B.

Lemma 8: If $\rho$ is $T_{\mathcal{W}\gamma}$-AMS for some variable-length shift $T_{\mathcal{W}\gamma} : \mathcal{W} \to \mathcal{W}$, then $\rho$ is $T_{\mathcal{W}\lambda}$-AMS for every variable-length shift $T_{\mathcal{W}\lambda} : \mathcal{W} \to \mathcal{W}$.

We note that Gray's proof of Proposition 2 [6, Sec. 7.3] elegantly combines convergent subsequences with the notion of asymptotic dominance. It is not clear if this argument can be extended to prove the more general Lemma 8. Instead, we take a more laborious approach and prove the lemma by showing an ergodic theorem and applying Lemma 5 (iii).

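Proof: We first show that if $\rho$ is $T_{\mathcal{W}\gamma}$-AMS, then $\rho$ must also be $T_{\mathcal{W}}$-AMS. We then show that if $\rho$ is $T_{\mathcal{W}}$-AMS, then $\rho$ must also be $T_{\mathcal{W}\lambda}$-AMS.

Assume that $\rho$ is $T_{\mathcal{W}\gamma}$-AMS. From Lemma 5 (iv), there exists a $T_{\mathcal{W}\gamma}$-stationary probability measure $\rho_\gamma$ on $(\mathcal{W}, \mathscr{F}(\mathcal{W}))$ such that $T_{\mathcal{W}\gamma}^{-1} A = A$ and $\rho_\gamma(A) = 0$ together imply that $\rho(A) = 0$. Using the procedure given by Gray and Kieffer in [8, Ex. 6], it can be shown that $\rho_\gamma$ is also $T_{\mathcal{W}}$-AMS. A second application of Lemma 5 (iv) shows that there exists a $T_{\mathcal{W}}$-stationary probability measure $\bar{\rho}$ on $(\mathcal{W}, \mathscr{F}(\mathcal{W}))$ such that $T_{\mathcal{W}}^{-1} A = A$ and $\bar{\rho}(A) = 0$ together imply that $\rho_\gamma(A) = 0$. Note also that if a set $A$ is $T_{\mathcal{W}}$-invariant, then it is also $T_{\mathcal{W}\gamma}$-invariant: $A = T_{\mathcal{W}}^{-1} A \Rightarrow A = T_{\mathcal{W}\gamma}^{-1} A$. On combining these facts, we have the following: if $A = T_{\mathcal{W}}^{-1} A$ and $\bar{\rho}(A) = 0$, then it must be true that $\rho_\gamma(A) = 0$, $A = T_{\mathcal{W}\gamma}^{-1} A$ and $\rho(A) = 0$. Thus, we have demonstrated the existence of a $T_{\mathcal{W}}$-stationary probability measure $\bar{\rho}$ on $(\mathcal{W}, \mathscr{F}(\mathcal{W}))$ such that $T_{\mathcal{W}}^{-1} A = A$ and $\bar{\rho}(A) = 0$ together imply that $\rho(A) = 0$. A third application of Lemma 5 (iv) shows that $\rho$ must indeed be $T_{\mathcal{W}}$-AMS.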
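We now show: if $\rho$ is $T_{\mathcal{W}}$-AMS, then $\rho$ must also be $T_{\mathcal{W}\lambda}$-AMS. To do this, it will be useful to identify the orbit$^5$ of $T_{\mathcal{W}\lambda}$ on each sequence $\mathbf{w} \in \mathcal{W}$ with a time subsequence $\zeta = \zeta_0, \zeta_1, \ldots$. Namely, for each $n = 0, 1, \ldots$ set $\zeta_n$ to be

$$\zeta_n = \begin{cases} 0, & \text{if } n = 0 \\ \sum_{i=0}^{n-1} \lambda(T_{\mathcal{W}\lambda}^i(\mathbf{w})), & \text{if } n \geq 1, \end{cases} \tag{14}$$

so, by construction, we have that

$$T_{\mathcal{W}\lambda}^n(\mathbf{w}) = w_{\zeta_n+1}, w_{\zeta_n+2}, \ldots = T_{\mathcal{W}}^{\zeta_n}(\mathbf{w}). \tag{15}$$

Let $\xi = \xi_0, \xi_1, \ldots$ be the weight sequence that corresponds to $\zeta$, as given by (12). Since the length of each shift is at most $N$, the density of $\zeta$ in $\mathbb{Z}^*$, as given by (13), can be no smaller than $1/N$ (when the limit exists).

Let $\mathcal{U}$ denote the collection of all sequences with elements from $\{1, 2, \ldots, N\}$, let $\mathscr{F}(\mathcal{U})$ be the $\sigma$-field on $\mathcal{U}$ generated by cylinder sets, and let $T_{\mathcal{U}}(\mathbf{u}) = u_2, u_3, \ldots$ be the left-shift transform. Let $\Lambda : \mathcal{W} \to \mathcal{U}$ be the mapping defined by

$$\Lambda(\mathbf{w}) = \lambda(\mathbf{w}), \lambda(T_{\mathcal{W}}(\mathbf{w})), \lambda(T_{\mathcal{W}}^2(\mathbf{w})), \ldots.$$

From Lemma 6, this mapping is $(T_{\mathcal{W}}, T_{\mathcal{U}})$-stationary since $T_{\mathcal{U}}(\Lambda(\mathbf{w})) = \Lambda(T_{\mathcal{W}}(\mathbf{w}))$. Finally, from Lemma 7 the induced measure $\rho_{wu}(A \times B) = \rho(A \cap \Lambda^{-1} B)$ on $(\mathcal{W} \times \mathcal{U}, \mathscr{F}(\mathcal{W}) \times \mathscr{F}(\mathcal{U}))$ is $T_{\mathcal{W}\mathcal{U}}$-AMS, where $T_{\mathcal{W}\mathcal{U}}(\mathbf{w}, \mathbf{u}) = (T_{\mathcal{W}}(\mathbf{w}), T_{\mathcal{U}}(\mathbf{u}))$.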

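Let $\mathcal{Z}$ denote the collection of all sequences with elements from $\{0, 1\}$, let $\mathscr{F}(\mathcal{Z})$ be the $\sigma$-field generated by cylinder sets, and let $T_{\mathcal{Z}}(\mathbf{z}) = z_2, z_3, \ldots$ be the left-shift transform. We now construct a finite-state coder $G : \mathcal{W} \times \mathcal{U} \to \mathcal{Z}$, which identifies the orbit of the variable-length shift $T_{\mathcal{W}\lambda}$. Define $\mathcal{S} = \{0, 1, \ldots, N-1\}$ to be the internal state space of the coder, and define the state update function $g_s$ and the output function $g_o$ by

$$g_s(w, u, s) = \begin{cases} u - 1, & \text{if } s = 0 \\ s - 1, & \text{otherwise,} \end{cases} \qquad g_o(w, u, s) = \begin{cases} 1, & \text{if } s = 0 \\ 0, & \text{otherwise.} \end{cases}$$

$^5$ The orbit of $T_{\mathcal{W}\lambda}$ on $\mathbf{w}$ is the sequence of points $\mathbf{w}, T_{\mathcal{W}\lambda}(\mathbf{w}), T_{\mathcal{W}\lambda}^2(\mathbf{w}), \ldots$ from $\mathcal{W}$.

Set $s_1 = 0$ and calculate the first output $z_1 = g_o(w_1, u_1, 0) = 1$. Update the state $s_2 = g_s(w_1, u_1, 0) = u_1 - 1$ and determine the next output $z_2 = g_o(w_2, u_2, u_1 - 1)$. Continue in this fashion to obtain the finite-state coder $G : \mathcal{W} \times \mathcal{U} \to \mathcal{Z}$. As with sequence coders, the finite-state coder $G$ is measurable and it induces a probability measure

$$\rho_{wuz}(A \times B \times C) = \rho_{wu}((A \times B) \cap G^{-1} C)$$

on $(\mathcal{W} \times \mathcal{U} \times \mathcal{Z}, \mathscr{F}(\mathcal{W}) \times \mathscr{F}(\mathcal{U}) \times \mathscr{F}(\mathcal{Z}))$. Moreover, this finite-state coder is an example of a one-sided Markov channel [16], so it follows from$^6$ [16, Thm. 6] that $\rho_{wuz}$ is $T_{\mathcal{W}\mathcal{U}\mathcal{Z}}$-AMS, where

$$T_{\mathcal{W}\mathcal{U}\mathcal{Z}}(\mathbf{w}, \mathbf{u}, \mathbf{z}) = (T_{\mathcal{W}}(\mathbf{w}), T_{\mathcal{U}}(\mathbf{u}), T_{\mathcal{Z}}(\mathbf{z})).$$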
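
The role of the finite-state coder can be seen in a small simulation on a finite prefix. The sketch below uses a hypothetical length function `lam` (shift by 2 if the first symbol is 0, otherwise by 1) and checks that the coder output marks exactly the orbit times of the variable-length shift; the helper names are illustrative and the coder ignores its `w` input, as in the definitions of the state update and output functions.

```python
from typing import Callable, List


def lam(w: List[int]) -> int:
    """Hypothetical length function depending only on the first symbol."""
    return 2 if w[0] == 0 else 1


def shift_lengths(w: List[int], lam: Callable[[List[int]], int]) -> List[int]:
    """Finite prefix of Lambda(w): u_i = lam(T^{i-1} w) for i = 1..len(w)."""
    return [lam(w[i:]) for i in range(len(w))]


def finite_state_coder(u: List[int]) -> List[int]:
    """Run the coder with state update g_s and output g_o, starting from s = 0."""
    z: List[int] = []
    s = 0
    for ui in u:
        z.append(1 if s == 0 else 0)        # output function g_o
        s = ui - 1 if s == 0 else s - 1     # state update function g_s
    return z


def orbit_times(w: List[int], lam: Callable[[List[int]], int]) -> List[int]:
    """Times visited by the variable-length shift within the prefix."""
    times: List[int] = []
    t = 0
    while t < len(w):
        times.append(t)
        t += lam(w[t:])
    return times


w_example = [0, 1, 1, 0, 1, 0, 0, 1]
u_example = shift_lengths(w_example, lam)
z_example = finite_state_coder(u_example)
marked = [i for i, zi in enumerate(z_example) if zi == 1]
```

The state acts as a countdown to the next orbit time, which is why at most $N - 1$ non-zero states are needed.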

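Consider the set

$$\Gamma = \{(\mathbf{w}, \mathbf{u}, \mathbf{z}) : \mathbf{w} \in \mathcal{W}, \ \mathbf{u} = \Lambda(\mathbf{w}), \ \mathbf{z} = G(\mathbf{w}, \Lambda(\mathbf{w}))\}.$$

It can be shown that $\Gamma$ is measurable and $\rho_{wuz}(\Gamma) = 1$. Suppose $(\mathbf{w}, \mathbf{u}, \mathbf{z}) \in \Gamma$, $\zeta$ is the time subsequence from (14), and $\xi$ is the weight sequence corresponding to $\zeta$. If $1_A : \mathcal{W} \times \mathcal{U} \times \mathcal{Z} \to \{0, 1\}$ is the indicator function defined by

$$1_A(\mathbf{w}, \mathbf{u}, \mathbf{z}) = \begin{cases} 1, & \text{if } z_1 = 1 \\ 0, & \text{otherwise,} \end{cases}$$

then, by construction, we have that

$$\xi_i = 1_A(T_{\mathcal{W}\mathcal{U}\mathcal{Z}}^i(\mathbf{w}, \mathbf{u}, \mathbf{z})) \tag{16}$$

for all $i = 0, 1, 2, \ldots$. Moreover, the density of $\zeta$ is given by (if the limit exists)

$$d_\zeta = \lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} \xi_i = \lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} 1_A(T_{\mathcal{W}\mathcal{U}\mathcal{Z}}^i(\mathbf{w}, \mathbf{u}, \mathbf{z})) = \langle 1_A \rangle(\mathbf{w}, \mathbf{u}, \mathbf{z}). \tag{17}$$

Finally, since the length of each shift is no more than $N$, it must be true that $d_\zeta \geq 1/N$ (when this limit exists).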

Since $p_{wuz}$ is $T_{\mathcal{WUZ}}$-AMS, it follows from Lemma 5 (iii) that there exists a subset $\Omega$ with probability $p_{wuz}(\Omega) = 1$ such that, for each $(\mathbf{w},\mathbf{u},\mathbf{z}) \in \Omega$, the limit

$$\langle g \rangle(\mathbf{w},\mathbf{u},\mathbf{z}) = \lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} g(T^{i}_{\mathcal{WUZ}}(\mathbf{w},\mathbf{u},\mathbf{z}))$$

exists for every bounded-measurable $g$. Since $1_A$ is bounded and measurable, this ergodic theorem guarantees the density (17) exists for every $(\mathbf{w},\mathbf{u},\mathbf{z}) \in \Omega \cap \Gamma$.

²Example (b) from [16] demonstrates that a finite-state coder is a special case of a one-sided Markov channel.

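Let $T^{\gamma}_{\mathcal{WUZ}}$ denote the variable-length shift on the product space $\mathcal{W} \times \mathcal{U} \times \mathcal{Z}$ defined by

$$T^{\gamma}_{\mathcal{WUZ}}(\mathbf{w},\mathbf{u},\mathbf{z}) = T^{\gamma(\mathbf{w})}_{\mathcal{WUZ}}(\mathbf{w},\mathbf{u},\mathbf{z}).$$

From (14), we have that $(T^{\gamma}_{\mathcal{WUZ}})^{j}(\mathbf{w},\mathbf{u},\mathbf{z}) = T^{\zeta_j}_{\mathcal{WUZ}}(\mathbf{w},\mathbf{u},\mathbf{z})$ for all $j = 0, 1, 2, \ldots$.

If $g : \mathcal{W} \times \mathcal{U} \times \mathcal{Z} \to (-\infty, \infty)$ is bounded-measurable, then $1_A \times g$ is bounded and measurable, and for each $(\mathbf{w},\mathbf{u},\mathbf{z}) \in \Omega \cap \Gamma$ the following limits will exist:

$$\begin{aligned} \langle 1_A \times g \rangle &= \lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} 1_A(T^{i}_{\mathcal{WUZ}}(\mathbf{w},\mathbf{u},\mathbf{z}))\, g(T^{i}_{\mathcal{WUZ}}(\mathbf{w},\mathbf{u},\mathbf{z})) \\ &= \lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} \hat{\zeta}_i\, g(T^{i}_{\mathcal{WUZ}}(\mathbf{w},\mathbf{u},\mathbf{z})) && (18) \\ &= d_{\zeta} \lim_{m \to \infty} \frac{1}{m} \sum_{j=0}^{m-1} g(T^{\zeta_j}_{\mathcal{WUZ}}(\mathbf{w},\mathbf{u},\mathbf{z})) && (19) \\ &= d_{\zeta} \lim_{m \to \infty} \frac{1}{m} \sum_{j=0}^{m-1} g((T^{\gamma}_{\mathcal{WUZ}})^{j}(\mathbf{w},\mathbf{u},\mathbf{z})), && (20) \end{aligned}$$

where (18) follows from (16), (19) follows from Lemma 4, and (20) follows from (14). This chain of equalities guarantees the limit in (20) exists for every $(\mathbf{w},\mathbf{u},\mathbf{z}) \in \Omega \cap \Gamma$. Since $g$ is an arbitrary bounded measurable function, it follows from Lemma 5 (iii) that $p_{wuz}$ is $T^{\gamma}_{\mathcal{WUZ}}$-AMS. Finally, since $p$ is a marginal of $p_{wuz}$, it follows that $p$ is $T^{\gamma}_{\mathcal{W}}$-AMS. ■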
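The passage from (18) to (19) has a simple finite-sample counterpart, which the following Python sketch checks with exact rational arithmetic. It is an illustration only: the orbit times `zeta`, the horizon `n`, the values `g`, and the bound `L_max` are hypothetical, and the limit statement itself still relies on Lemma 4.

```python
from fractions import Fraction


def weight_sequence(zeta, n):
    """0/1 weights over times 0..n-1: weight 1 exactly at the times listed in zeta."""
    marks = set(zeta)
    return [1 if i in marks else 0 for i in range(n)]


# Hypothetical data: orbit times below n, and bounded values g(T^i(w, u, z)).
zeta = [0, 2, 5, 6, 7]
n = 9
g = [3, -1, 4, 1, -5, 9, 2, 6, 5]
L_max = 3  # assumed largest shift length in this toy example

weights = weight_sequence(zeta, n)
m = sum(weights)  # number of orbit points among the first n times

# Weighted time average, as in (18), before taking the limit.
weighted_avg = Fraction(sum(wt * gi for wt, gi in zip(weights, g)), n)

# Empirical density times the average along the subsequence, as in (19).
density_n = Fraction(m, n)
subsequence_avg = Fraction(sum(g[t] for t in zeta), m)

print(weighted_avg == density_n * subsequence_avg)
print(density_n >= Fraction(1, L_max))
```

At every finite horizon the two sides agree exactly, so the only analytic content of (19) is that both the density and the subsequence average converge.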

E. Ergodic Processes & Variable Length Shifts

In Lemma 8, it was shown that an AMS random process remains AMS under all variable-length time shifts. The next lemma proves a weaker result for ergodic processes. Again, suppose that $W$ is a discrete-finite alphabet random process and $(\mathcal{W}, \mathcal{B}(\mathcal{W}), p, T_{\mathcal{W}})$ is the corresponding dynamical system.

Lemma 9: If $p$ is $T^{\gamma}_{\mathcal{W}}$-ergodic for some variable-length shift $T^{\gamma}_{\mathcal{W}} : \mathcal{W} \to \mathcal{W}$, then $p$ is also $T_{\mathcal{W}}$-ergodic.

Proof: If $p$ is $T^{\gamma}_{\mathcal{W}}$-ergodic and $A$ is a $T^{\gamma}_{\mathcal{W}}$-invariant set, then $p(A) = 0$ or $1$. Since $A = T^{-1}_{\mathcal{W}}A$ implies that $A = (T^{\gamma}_{\mathcal{W}})^{-1}A$, it follows that $p(A) = 0$ or $1$ for every $T_{\mathcal{W}}$-invariant set $A$. ■
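F. Proof of Theorem 1-B (Forward Claim)

We now prove the forward claim of Theorem 1-B: if $\mu$ is $T_{\mathcal{X}}$-AMS (and $T_{\mathcal{X}}$-ergodic), then $\eta$ is $T_{\mathcal{Y}}$-AMS (and $T_{\mathcal{Y}}$-ergodic). Let $\mathcal{Z}$ denote the set of all sequences with elements from $\{1, 2, \ldots, N\}$, let $\mathcal{B}(\mathcal{Z})$ denote the $\sigma$-field generated by cylinder sets, and let $T_{\mathcal{Z}}(\mathbf{z}) = z_2, z_3, \ldots$ denote the left-shift transform. Using the word function $f$, define the mapping

$$\tilde{f}(x) = (f(x)_1, |f(x)|),\ (f(x)_2, |f(x)| - 1),\ \ldots,\ (f(x)_{|f(x)|}, 1),$$

where $f(x)_j$, $1 \leq j \leq |f(x)|$, denotes the $j$th symbol of the codeword $f(x)$. By construction, $\tilde{f}(x)$ couples the codeword $f(x)$ with a sequence of indices $|f(x)|, |f(x)| - 1, \ldots, 1$, which mark the distance from the current symbol to the end of the codeword. Using $\tilde{f}$, define the sequence coder $\tilde{F} : \mathcal{X} \to \mathcal{Y} \times \mathcal{Z}$ via $\tilde{F}(\mathbf{x}) = \tilde{f}(x_1), \tilde{f}(x_2), \ldots$. As before, this sequence coder induces a probability measure $\eta_{yz}(A \times B) = \mu(\tilde{F}^{-1}(A \times B))$ on $(\mathcal{Y} \times \mathcal{Z}, \mathcal{B}(\mathcal{Y}) \times \mathcal{B}(\mathcal{Z}))$. Let $T_{\mathcal{YZ}}(\mathbf{y},\mathbf{z}) = (T_{\mathcal{Y}}(\mathbf{y}), T_{\mathcal{Z}}(\mathbf{z}))$, and let $T^{\gamma}_{\mathcal{YZ}}$ be the variable-length shift defined by setting $\gamma(\mathbf{y},\mathbf{z}) = z_1$. Since

$$\tilde{F}(T_{\mathcal{X}}(\mathbf{x})) = T^{\gamma}_{\mathcal{YZ}}(\tilde{F}(\mathbf{x})),$$

it follows from Lemma 6 that $\tilde{F}$ is a $(T_{\mathcal{X}}, T^{\gamma}_{\mathcal{YZ}})$-stationary sequence coder. Since $\mu$ is $T_{\mathcal{X}}$-AMS (and $T_{\mathcal{X}}$-ergodic), we have from Lemma 7 that $\eta_{yz}$ is $T^{\gamma}_{\mathcal{YZ}}$-AMS (and $T^{\gamma}_{\mathcal{YZ}}$-ergodic). Finally, from Lemmas 8 and 9, we can see that $\eta_{yz}$ must also be $T_{\mathcal{YZ}}$-AMS (and $T_{\mathcal{YZ}}$-ergodic); therefore, $\eta$ must be $T_{\mathcal{Y}}$-AMS (and $T_{\mathcal{Y}}$-ergodic).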
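The coupling of each code symbol with its distance to the end of the codeword can be illustrated with a short Python sketch. The codebook below is hypothetical, sequences are truncated to finite lists, and the names `f_tilde`, `F_tilde` and `variable_shift` are chosen here for illustration. The check at the end is the finite analogue of the commutation relation used in the proof: shifting the source by one symbol matches shifting the coupled sequence by the index stored in its first pair.

```python
# Hypothetical word function f: source symbol -> codeword over {0, 1}.
codebook = {"a": "0", "b": "10", "c": "110"}


def f_tilde(x):
    """Pair each symbol of f(x) with its distance to the end of the codeword."""
    c = codebook[x]
    return [(sym, len(c) - j) for j, sym in enumerate(c)]


def F_tilde(xs):
    """Sequence coder: concatenate the coupled codewords of xs."""
    return [pair for x in xs for pair in f_tilde(x)]


def variable_shift(yz):
    """Variable-length shift with gamma(y, z) = z_1: drop the first z_1 pairs."""
    gamma = yz[0][1]
    return yz[gamma:]


xs = ["c", "a", "b"]
coded = F_tilde(xs)
print(coded)
print(variable_shift(coded) == F_tilde(xs[1:]))
```

Because the index in the first pair always equals the length of the current codeword, the shift lands exactly on the start of the next codeword, whether or not the codebook is prefix-free.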

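G. Proof of Theorem 1-B (Reverse Claim)

We now prove the reverse claim of Theorem 1-B: if $\eta$ is $T_{\mathcal{Y}}$-AMS and $f$ is prefix-free, then $\mu$ is $T_{\mathcal{X}}$-AMS. Define the variable-length shift $T^{\gamma}_{\mathcal{Y}} : \mathcal{Y} \to \mathcal{Y}$ by setting

$$\gamma(\mathbf{y}) = \begin{cases} |c|, & \text{if there exists a unique } c \in \mathcal{C} \text{ such that } y_i = c_i \text{ for all } i = 1, 2, \ldots, |c|, \\ 1, & \text{otherwise.} \end{cases}$$

From Lemma 8, it follows that $\eta$ is $T^{\gamma}_{\mathcal{Y}}$-AMS.

Define

$$\Omega = \{\mathbf{y} \in \mathcal{Y} : \text{there exists } \mathbf{x} \in \mathcal{X} \text{ such that } \mathbf{y} = F(\mathbf{x})\},$$

where it can be shown that $\Omega \in \mathcal{B}(\mathcal{Y})$ and $\eta(\Omega) = 1$.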
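Let $g : \mathcal{C} \to \mathcal{X}$ denote the inverse of $f$. If $\mathbf{y}$ is in $\Omega$, then there exists a unique sequence of codewords $c_1, c_2, \ldots$ from $\mathcal{C}$ such that $\mathbf{y} = c_1, c_2, \ldots$. Therefore, using $g$, we may define the sequence coder $G : \mathcal{Y} \to \mathcal{X}$ by setting $G(\mathbf{y}) = F^{-1}(c_1, c_2, \ldots) = g(c_1), g(c_2), \ldots$.

For each $\mathbf{y} \in \Omega$ we have that $G(T^{\gamma}_{\mathcal{Y}}(\mathbf{y})) = T_{\mathcal{X}}(G(\mathbf{y}))$, so it follows from Lemma 6 that $G$ is a $(T^{\gamma}_{\mathcal{Y}}, T_{\mathcal{X}})$-stationary sequence coder. From Lemma 7, the induced probability measure $\hat{\mu}(A) = \eta(G^{-1}A)$ on $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$ is $T_{\mathcal{X}}$-AMS. Since $\hat{\mu}(A) = \eta(G^{-1}A) = \mu(F^{-1}G^{-1}A) = \mu(A)$ for each $A \in \mathcal{B}(\mathcal{X})$, it follows that $\mu$ is $T_{\mathcal{X}}$-AMS. ■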
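The decoding step in this proof can be sketched in Python as follows. The prefix-free codebook is hypothetical and sequences are truncated to finite strings, so this is an illustration of the parsing idea rather than the measure-theoretic construction. The function `gamma` returns the length of the unique codeword that prefixes its argument (or 1 if there is none), and `G` decodes by repeatedly applying the variable-length shift.

```python
# Hypothetical prefix-free word function f and its inverse g.
codebook = {"a": "0", "b": "10", "c": "110", "d": "111"}
inverse = {c: x for x, c in codebook.items()}


def gamma(y):
    """Length of the unique codeword that is a prefix of y, or 1 if none exists."""
    matches = [c for c in inverse if y.startswith(c)]
    return len(matches[0]) if len(matches) == 1 else 1


def G(y):
    """Decode a concatenation of codewords by repeated variable-length shifts."""
    decoded = []
    while y:
        k = gamma(y)
        word = y[:k]
        if word not in inverse:
            raise ValueError("input is not a concatenation of codewords")
        decoded.append(inverse[word])
        y = y[k:]
    return decoded


y = "110" + "0" + "10" + "111"  # encoding of c, a, b, d
print(G(y))
# Finite analogue of G(T^gamma(y)) = T(G(y)).
print(G(y[gamma(y):]) == G(y)[1:])
```

The prefix-free property is what makes `matches` contain at most one codeword, so the parse is unique and the shifted output decodes to the shifted source sequence.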



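VII. Proof of Theorem 2 & Corollaries

A. Proof of Theorem 2

Let $\{\bar{\mu}_{\mathbf{x}} : \mathbf{x} \in \mathcal{X}\}$ and $\{\bar{\eta}_{\mathbf{y}} : \mathbf{y} \in \mathcal{Y}\}$ be the ergodic decompositions of $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$ and $(\mathcal{Y}, \mathcal{B}(\mathcal{Y}))$ respectively. For each $n = 1, 2, \ldots$, let $\phi_n : \mathcal{Y} \to \mathcal{Y}^n$ be the projection $\phi_n(\mathbf{y}) = y_1, y_2, \ldots, y_n$. From Lemma 3, there exists a subset $\Omega_{x,1} \in \mathcal{B}(\mathcal{X})$ with probability $\mu(\Omega_{x,1}) = 1$ such that the sample-entropy rate of each sequence $\mathbf{x} \in \Omega_{x,1}$ exists and is given by $h(\mu, \mathbf{x}) = \varphi_x(\mathbf{x})$, where $\varphi_x(\mathbf{x}) = H(\bar{\mu}_{\mathbf{x}})$. Similarly, there exists a subset $\Omega_y \in \mathcal{B}(\mathcal{Y})$ with probability $\eta(\Omega_y) = 1$ such that the sample-entropy rate of each sequence $\mathbf{y} \in \Omega_y$ exists and is given by $h(\eta, \mathbf{y}) = \varphi_y(\mathbf{y})$, where $\varphi_y(\mathbf{y}) = H(\bar{\eta}_{\mathbf{y}})$. Finally, from Lemma 2 there exists a subset $\Omega_{x,2} \in \mathcal{B}(\mathcal{X})$ with probability $\mu(\Omega_{x,2}) = 1$ such that for each sequence $\mathbf{x} \in \Omega_{x,2}$ the time-averaged codeword-length exists and is given by

$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} |f(x_i)| = \lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} \bigl|f\bigl((T^{i}_{\mathcal{X}}(\mathbf{x}))_1\bigr)\bigr| = E[\bar{\mu}_{\mathbf{x}}, f].$$

For each $\mathbf{x} \in \mathcal{X}$ define the time subsequence $\zeta = \zeta_0, \zeta_1, \ldots$ by setting $\zeta_0 = 0$ and

$$\zeta_n = \sum_{i=1}^{n} |f(x_i)|, \qquad n = 1, 2, \ldots.$$

For each $n = 1, 2, \ldots$, we have that $F^{-1}[\phi_{\zeta_n}(F(\mathbf{x}))] \supseteq [x^n]$, with set equality if $f$ is prefix free. Since $\eta(B) = \mu(F^{-1}B)$, this implies

$$-\frac{\zeta_n}{n} \cdot \frac{1}{\zeta_n} \log_2 \eta([\phi_{\zeta_n}(F(\mathbf{x}))]) \leq -\frac{1}{n} \log_2 \mu([x^n]) \tag{21}$$

with equality if $f$ is prefix free. Furthermore,

$$-\frac{1}{\zeta_n} \log_2 \eta([\phi_{\zeta_n}(F(\mathbf{x}))]) \tag{22}$$

is a subsequence of

$$-\frac{1}{n} \log_2 \eta([\phi_{n}(F(\mathbf{x}))]); \tag{23}$$

thus, if $\mathbf{x} \in F^{-1}\Omega_y$, then (22) and (23) both converge to $\varphi_y(F(\mathbf{x}))$ as $n \to \infty$. To complete the proof, note that Theorem 2 follows from (21) since $\lim_{n \to \infty} \zeta_n/n = E[\bar{\mu}_{\mathbf{x}}, f]$, $\lim_{n \to \infty} -(1/n) \log_2 \mu([x^n]) = H(\bar{\mu}_{\mathbf{x}})$ and $\lim_{n \to \infty} -(1/\zeta_n) \log_2 \eta([\phi_{\zeta_n}(F(\mathbf{x}))])$ exists for every $\mathbf{x} \in \Omega_{x,1} \cap \Omega_{x,2} \cap F^{-1}\Omega_y$. ■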
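The set inclusion behind (21) can be checked by brute force on a toy example. The Python sketch below assumes an i.i.d. source with a hypothetical symbol distribution `p` and two hypothetical codebooks, one prefix-free and one not; none of these come from the proof itself. It computes the probability of the cylinder of coded sequences that begin with the encoding of a source block and compares it with the probability of the source cylinder.

```python
from itertools import product
from math import log2, prod

# Hypothetical i.i.d. source distribution and codebooks (illustration only).
p = {"a": 0.5, "b": 0.25, "c": 0.25}
prefix_free = {"a": "0", "b": "10", "c": "11"}
not_prefix_free = {"a": "0", "b": "01", "c": "1"}


def encode(xs, f):
    """Concatenate the codewords of the source block xs."""
    return "".join(f[x] for x in xs)


def mu_cylinder(xs):
    """Probability that the source sequence begins with the block xs."""
    return prod(p[x] for x in xs)


def eta_cylinder(prefix, f):
    """Probability that the coded sequence begins with prefix (brute force).

    Every codeword has length >= 1, so the first len(prefix) source symbols
    already determine the first len(prefix) coded symbols.
    """
    k = len(prefix)
    return sum(
        prod(p[x] for x in xs)
        for xs in product(p, repeat=k)
        if encode(xs, f).startswith(prefix)
    )


x_block = ("b", "a", "c")
mu = mu_cylinder(x_block)
eta_pf = eta_cylinder(encode(x_block, prefix_free), prefix_free)
eta_npf = eta_cylinder(encode(x_block, not_prefix_free), not_prefix_free)

# Source entropy per symbol divided by mean codeword length (prefix-free code).
entropy = -sum(q * log2(q) for q in p.values())
mean_len = sum(p[x] * len(prefix_free[x]) for x in p)

print(mu, eta_pf, eta_npf)
print(entropy / mean_len)
```

For the prefix-free codebook the two cylinder probabilities coincide, while for the other codebook the coded cylinder can only be at least as likely, which is the direction of the inequality in (21).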

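B. Proof of Corollary 2.1

Let $\{\bar{\mu}_{\mathbf{x}} : \mathbf{x} \in \mathcal{X}\}$ and $\{\bar{\eta}_{\mathbf{y}} : \mathbf{y} \in \mathcal{Y}\}$ be the ergodic decompositions of $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$ and $(\mathcal{Y}, \mathcal{B}(\mathcal{Y}))$ respectively. As usual, define $\varphi_x(\mathbf{x}) = H(\bar{\mu}_{\mathbf{x}})$ and $\varphi_y(\mathbf{y}) = H(\bar{\eta}_{\mathbf{y}})$. Now define $\psi_x(\mathbf{x}) = \varphi_y(F(\mathbf{x}))$ and

$$g(\mathbf{x}) = \frac{H(\bar{\mu}_{\mathbf{x}})}{E[\bar{\mu}_{\mathbf{x}}, f]}.$$

Suppose $\mu$ is $T_{\mathcal{X}}$-AMS. From Theorem 2, we have that $\eta$ is $T_{\mathcal{Y}}$-AMS and $\psi_x(\mathbf{x}) \leq g(\mathbf{x})$ on a set $\Omega_x$ of probability $\mu(\Omega_x) = 1$ (with equality if $f$ is prefix-free). Therefore,

$$\int \psi_x(\mathbf{x})\, d\mu(\mathbf{x}) \leq \int g(\mathbf{x})\, d\mu(\mathbf{x}). \tag{24}$$

Note, the R.H.S. of (24) is equal to the R.H.S. of (10). By the change of variables formula [6, Lem. 4.4.7] and Lemma 3, we have

$$\int \psi_x(\mathbf{x})\, d\mu(\mathbf{x}) = \int \varphi_y(\mathbf{y})\, d\eta(\mathbf{y}) = H(\eta), \tag{25}$$

which is the desired result. ■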

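C. Proof of Corollary 2.2

Suppose that $\mu$ is $T_{\mathcal{X}}$-stationary and $T_{\mathcal{X}}$-ergodic. From Theorem 1-B, $\eta$ is $T_{\mathcal{Y}}$-ergodic. From Lemma 3, there exists a subset $\Omega_y \in \mathcal{B}(\mathcal{Y})$ with probability $\eta(\Omega_y) = 1$ such that the sample-entropy rate of each sequence $\mathbf{y} \in \Omega_y$ takes the same constant value $h(\eta, \mathbf{y}) = H(\eta)$. From Theorem 2, there exists a subset $\Omega_x \in \mathcal{B}(\mathcal{X})$ with probability $\mu(\Omega_x) = 1$ such that the sample-entropy rate of each coded sequence $F(\mathbf{x})$, $\mathbf{x} \in \Omega_x$, exists and is bounded from above by

$$h(\eta, F(\mathbf{x})) \leq \frac{H(\bar{\mu}_{\mathbf{x}})}{E[\bar{\mu}_{\mathbf{x}}, f]}. \tag{26}$$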

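Since $F^{-1}\Omega_y \cap \Omega_x \neq \emptyset$, there exists $\mathbf{x} \in \Omega_x$ and $\mathbf{y} \in \Omega_y$ such that $\mathbf{y} = F(\mathbf{x})$ and

$$h(\eta, \mathbf{y}) \leq \frac{H(\bar{\mu}_{\mathbf{x}})}{E[\bar{\mu}_{\mathbf{x}}, f]} = \frac{H(\mu)}{E[\mu, f]}, \tag{27}$$

where the R.H.S. equality in (27) follows from the fact that $\mu$ is $T_{\mathcal{X}}$-stationary and $T_{\mathcal{X}}$-ergodic. The result follows since $h(\eta, \mathbf{y})$ exists and takes the constant value $H(\eta)$ on $\Omega_y$. Finally, note that for prefix-free codes (26) and therefore (27) are equalities. ■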